Generating Classifier Committees by Stochastically Selecting both Attributes and Training Examples
Author
Abstract
Boosting and Bagging, as two representative approaches to learning classifier committees, have demonstrated great success, especially for decision tree learning. They repeatedly build different classifiers using a base learning algorithm by changing the distribution of the training set. Sasc, a different type of committee learning method, can also significantly reduce the error rate of decision trees. It generates classifier committees by stochastically modifying the set of attributes but keeping the distribution of the training set unchanged. It has been shown that Bagging and Sasc are, on average, less accurate than Boosting, but the performance of the former is more stable than that of the latter, in the sense that they less frequently produce significantly higher error rates than the base learning algorithm. In this paper, we propose a novel committee learning algorithm, called SascBag, that combines Sasc and Bagging. It creates different classifiers by stochastically varying both the attribute set and the distribution of the training set. Experimental results on a representative collection of natural domains show that, for decision tree learning, the new algorithm is, on average, more accurate than Boosting, Bagging, and Sasc, and more stable than Boosting. In addition, like Bagging and Sasc, SascBag is amenable to parallel and distributed processing, while Boosting is not. This gives SascBag a further advantage over Boosting for parallel machine learning and data mining.
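The abstract describes the committee-generation idea but not the algorithm in detail. The following is a minimal sketch of that idea, under stated assumptions: each committee member is trained on a bootstrap sample of the training set (the Bagging component) and restricted to a randomly chosen attribute subset (the Sasc component). The names `sascbag`, `p_attr`, and the decision-stump base learner are illustrative choices, not the paper's actual implementation (which uses full decision trees).

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

class Stump:
    """One-level decision tree; a stand-in for the paper's decision-tree base learner."""
    def fit(self, X, y, attrs):
        # Default: predict the majority class if no useful split is found.
        self.attr, self.thresh = attrs[0], None
        self.left = self.right = majority(y)
        best_err = len(y) + 1
        for a in attrs:                      # only the given attribute subset is visible
            for t in sorted({row[a] for row in X}):
                l = [yi for row, yi in zip(X, y) if row[a] <= t]
                r = [yi for row, yi in zip(X, y) if row[a] > t]
                if not l or not r:
                    continue
                pl, pr = majority(l), majority(r)
                err = sum(p != yi for p, yi in
                          zip([pl] * len(l) + [pr] * len(r), l + r))
                if err < best_err:
                    best_err = err
                    self.attr, self.thresh = a, t
                    self.left, self.right = pl, pr
        return self

    def predict(self, row):
        if self.thresh is None:
            return self.left
        return self.left if row[self.attr] <= self.thresh else self.right

def sascbag(X, y, n_members=11, p_attr=0.7, seed=0):
    """Build a committee whose members differ in BOTH their bootstrap
    sample of the training set and their random attribute subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    committee = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]        # bootstrap sample (Bagging)
        attrs = ([a for a in range(d) if rng.random() < p_attr]
                 or [rng.randrange(d)])                   # stochastic attribute subset (Sasc)
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        committee.append(Stump().fit(Xb, yb, attrs))
    return committee

def committee_predict(committee, row):
    """Classify by unweighted majority vote over the committee."""
    return majority([m.predict(row) for m in committee])
```

Because each member is built independently from its own sample and attribute subset, the loop in `sascbag` can be run in parallel, which is the parallelism advantage over Boosting noted in the abstract.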
Similar papers
Extending Bayesian Classifier with Ontological Attributes
The goal of inductive learning classification is to form generalizations from a set of training examples such that the classification accuracy on previously unobserved examples is maximized. Given a specific learning algorithm, it is obvious that its classification accuracy depends on the quality of training data. In learning from examples, noise is anything which obscures correlations between ...
Selecting representative examples and attributes by a genetic algorithm
A nearest-neighbor classifier compares an unclassified object to a set of preclassified examples and assigns to it the class of the most similar of them (the object’s nearest neighbor). In some applications, many pre-classified examples are available and comparing the object to each of them is expensive. This motivates studies of methods to remove redundant and noisy examples. Another strand of...
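The snippet above describes the basic nearest-neighbor rule: assign the class of the most similar pre-classified example. A minimal 1-NN sketch, assuming numeric feature vectors and Euclidean distance (the `(features, label)` pair format is an illustrative assumption):

```python
import math

def nearest_neighbor(examples, query):
    """1-NN: return the label of the pre-classified example closest to `query`.

    `examples` is a list of (feature_vector, label) pairs; every comparison
    against the full example set is what makes this expensive when many
    pre-classified examples are available, motivating example-removal methods.
    """
    _, label = min(examples, key=lambda ex: math.dist(ex[0], query))
    return label
```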
Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection
Automatic processes on seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is the seismic object detection using different supervised classification methods that finally has an output as a probability cube. Object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...
Using Prediction from Sentential Scope to Build a Pseudo Co-Testing Learner for Event Extraction
Event extraction involves the identification of instances of a type of event, along with their attributes and participants. Developing a training corpus by annotating events in text is very labor intensive, and so selecting informative instances to annotate can save a great deal of manual work. We present an active learning (AL) strategy, pseudo co-testing, based on one view from a classifier a...
Constructing Diverse Classifier Ensembles using Artificial Training Examples
Ensemble methods like bagging and boosting that combine the decisions of multiple hypotheses are some of the strongest existing machine learning methods. The diversity of the members of an ensemble is known to be an important factor in determining its generalization error. This paper presents a new method for generating ensembles that directly constructs diverse hypotheses using additional arti...